DOI: 10.1145/1878101.1878107
research-article

Direct posterior confidence for out-of-vocabulary spoken term detection

Published: 29 October 2010

ABSTRACT

Spoken term detection (STD) is a fundamental task in spoken information retrieval. Unlike conventional speech transcription and keyword spotting, STD is an open-vocabulary task and must therefore handle out-of-vocabulary (OOV) terms. Approaches based on subword units, e.g. phonemes, are widely used to address the OOV issue; however, performance on OOV terms is still significantly inferior to that on in-vocabulary (INV) terms.

The performance degradation on OOV terms can be attributed to many factors. The one we address in this paper is that the acoustic and language models used for speech transcription are highly vulnerable to OOV terms, which leads to unreliable confidence measures and error-prone detections.

A direct posterior confidence measure derived from discriminative models has been proposed for STD. In this paper, we use this technique to tackle the weakness of confidence estimation for OOV terms. Because neither acoustic models nor language models are included in the computation, the new confidence measure avoids the weak modeling problem with OOV terms. Our experiments, set up on multi-party meeting speech, which is highly spontaneous and conversational, demonstrate that the proposed technique significantly improves STD performance on OOV terms; when combined with conventional lattice-based confidence, it yields a significant improvement on both INV and OOV terms. Furthermore, the new confidence measure can be combined with other advanced techniques for OOV treatment, such as stochastic pronunciation modeling and term-dependent confidence discrimination, leading to an integrated solution for OOV STD with greatly improved performance.


Published in

SSCS '10: Proceedings of the 2010 international workshop on Searching spontaneous conversational speech
October 2010, 72 pages
ISBN: 9781450301626
DOI: 10.1145/1878101

Copyright © 2010 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
